Abstract. Many functions have been recently defined to assess the similarity among
networks as tools for quantitative comparison. They stem from very different frameworks
and they are tuned for dealing with different situations. Here we show an overview of the
spectral distances, highlighting their behavior in some basic cases of static and dynamic
synthetic and real networks.
Introduction

Citing a comprehensive review [1], a complex network is a graph whose structure is
irregular and dynamically evolving in time. In terms of architectures, Strogatz [2] used the
term "complex" to describe a network that is the counterpart of "regular" graphs (chains,
grids, lattices and fully-connected graphs), with random graphs lying at the opposite extreme
of the complexity spectrum. Network models from empirical studies lie somewhere
between regularity and randomness; although more often unbalanced towards the latter,
they can have unexpectedly highly symmetric structures [3].

This article reviews and benchmarks a class of methods that tackle the problem of
comparing structure between networks. Structure and structural properties of networks have
been studied in a wide variety of fields in science [4, 5, 6], with methods ranging from
statistical physics to machine learning [7, 8]. Structural analysis is of central importance in
computational biology [9]. Cootes pointed out that the comparison of biological networks
can provide much more evolutionary information than studying each network separately
[10]. Furthermore, the comparison of protein interaction networks can help in designing
models of cellular functions [11, 12]. Comparison methods are essential with dynamic
networks to measure differences between two consecutive network states and then model
the whole series. Comparison is also essential in network reconstruction (e.g. of gene
regulation networks) by structure reverse engineering starting from steady-state or time series
data [13, 14, 15], where performance has to be gauged against the ground truth of a real or
simulated network.

Our interest in network comparison is motivated by the study of network stability.
On this less beaten path, only network robustness with respect to perturbations has been
considered until now [16, 17]. It is envisioned that the choice of appropriate measures
between networks would enable new model selection procedures such as those available in
molecular profiling for sets of ranked gene lists [18].

In this study, six candidate distances derived from the family of spectral similarity
measures are investigated for network comparison. After a first presentation of spectral
measures and alternatives in the rest of this introduction, a technical overview is provided in
Sect. 2 and candidate measures are presented. Benchmark data and experiments devised
to exemplify and compare the candidates are presented in Sect. 3.
Related Work. The basic goal of network comparison is quantifying the difference between
two homogeneous objects in some network space. The theory of network measurements
relies on the quantitative description of main properties such as degree distribution and
correlation, path lengths, diameter, clustering, presence of motifs [19]. These and other
properties have been described for complex networks in [5, 1] and recently reviewed by
MacArthur and Sanchez-Garcia [20]. Furthermore, network measurements can be encoded
into a feature vector, yielding a representation convenient for classification tasks [21].

The use of similarity measures on the topology of the underlying graphs defines a
different strategy, whose roots date back to the 70's with the theory of graph distances (regarding
both inter- and intra-graph metrics [22]). Since then, a number of similarity measures have
been introduced, including metrics relaxed to less stringent bounds. Cost-based functions
stem from the parallel theory of graph alignment: the edit distance and its variants use
the minimum cost of transformation of one graph into another by means of the usual edit
operations, namely insertion and deletion of links.

Feature-based measures are instead obtained when the similarity function is based on
feature vectors of network measurements. One notable example in this family is the recently
proposed use of $\zeta$-functions for network volume measurements [23, 24].

Finally, the label "structure-based" distance groups all other measures that do not rely
on cost functions or characteristic features. Typical examples are those measures based on
functions of the maximal common subgraphs between the two networks, or those based on
the common motifs [25], i.e. patterns of interconnections occurring in complex networks
significantly more often than in randomized networks. Remarkably, the equivalence of some
structure-based distances and the edit distance has been proven [26]. Although in most
cases only network topology is considered, measures were also introduced that deal with
directed or weighted links: for an example of a generic construction and an application to
biological networks, see [27].

The family of spectral measures, which is investigated in this paper, is also part of the
group of structure-based distances. Basically, it consists of a variety of maps of the network's
eigenvalues. The theory of graph spectra started in the early 50's and since then many of
its aspects have been deeply mined, including a first classification of networks [28]. The
spectral theory has been applied to biological networks [29, 30], where the properties of
being scale-free (the degree distribution following a power law) and small-world (most
nodes are not neighbors of one another, but most nodes can be reached from every other
by a small number of hops or steps) are particularly evident. Estimates (also asymptotic)
of the eigenvalue distribution are available for complex networks [31]. The idea of using
spectral measures for network comparison is instead only recent and it relies on similarity
measures that are functions of the network eigenvalues. However, it is important to note
that, because of the existence of isospectral networks, all these measures are indeed
distances between classes of isospectral graphs. An overview of the most common spectral
similarity measures and of their basic properties is presented in the rest of this paper.

1. Notations 

Formally, any network can be represented as a graph, a mathematical entity consisting
of $N$ nodes (vertices) and $E$ edges (links or arrows) connecting pairs of nodes and
representing interactions ($N \in \mathbb{N} \cup \{\infty\}$). Loops are allowed, i.e. an edge can
link a node to itself to indicate self-interaction (some authors use the term pseudograph to
indicate a graph with loops). Edges can be bidirectional or unidirectional: in the latter case the
graph is called directed (digraph, for short) and the edges are represented by arrows. Moreover,
edges can carry weights to indicate interaction intensity: in this case, the network is called
weighted. More refined structures exist but they are not considered here. For instance:
labeled graphs, where functions from some subsets of the integers to the vertices (edges) of
the graph identify classes of vertices (edges); hypergraphs, where an edge can connect any
number of vertices; and multigraphs, where any number of edges between two vertices is
allowed. For any network $G$, its topology consists of the set $V(G) = \{v_1, \ldots, v_N\}$ of its
nodes and the set $E(G) = \{e_1 = (v_{i_1}, v_{j_1}), \ldots, e_E = (v_{i_E}, v_{j_E})\}$ of its edges, neglecting
weights and directions. Different types of graph sharing the same topology are displayed
in Fig. 1.

Figure 1. Network types

A network, or graph, is characterized completely by its adjacency matrix $A$, i.e. an
$N \times N$ matrix whose nonzero entries denote the various links between the graph's $N$
nodes. Directions and weights are represented by the signs (or by asymmetry) and
values of the matrix entries. For the underlying topology (and thus for any unweighted
undirected network), the adjacency matrix is symmetric and with entries in $\{0, 1\}$. The
adjacency matrices for the weighted digraph in Fig. 1 and its topology are shown in Tab. 1,
where node ordering is clockwise starting from the top node. This representation is not
unique, in that it depends on the actual labeling of the nodes: isomorphic graphs (identical
graphs with permuted labels) have adjacency matrices that coincide only up to a simultaneous
permutation of rows and columns. Similarly, graphical representations are not unique
either, since node placement is arbitrary.
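The dependence of the adjacency matrix on node labeling can be illustrated with a minimal sketch. The helper names `adjacency` and `relabel` and the three-node path graph are illustrative assumptions, not part of the original study:

```python
def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph on n nodes."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1
        a[v][u] = 1
    return a


def relabel(a, perm):
    """Adjacency matrix after renaming node i to perm[i]."""
    n = len(a)
    b = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            b[perm[i]][perm[j]] = a[i][j]
    return b


# Hypothetical example: the path 0 - 1 - 2, then the same graph
# with the names of nodes 0 and 1 swapped.
path = adjacency(3, [(0, 1), (1, 2)])
same_graph = relabel(path, [1, 0, 2])
```

The two matrices differ entry by entry although they describe the same graph; quantities such as the sorted list of row sums (the degrees) are unaffected by the relabeling.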

Table 1. Adjacency matrices for the weighted directed network in Fig. 1 (two
alternative matrices: with the sign indicating direction, or asymmetric, with
the (positive) value only in entry $(i, j)$ if $i \to j$) and for its topology;
node ordering is clockwise starting from the top node.
The degree (deg) of a vertex in an undirected graph is the number of edges touching the
vertex itself, with loops (usually, but not for all authors) counted twice. The degree matrix
$D$ is the diagonal matrix with the vertex degrees on the diagonal, as for the network
topology in Fig. 1. The Laplacian matrix of a graph is
defined as the difference between the degree and the adjacency matrices: $L = D - A$.
Thus, for an undirected and unweighted graph with no loops (a simple graph), $L$ has zero
row/column sum.
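As a minimal sketch of the definition $L = D - A$, the following builds the Laplacian of a small simple graph and checks the zero row sums. The function name `laplacian` and the example graph (a triangle with one pendant node) are assumptions chosen for illustration:

```python
def laplacian(a):
    """Laplacian L = D - A of a simple undirected graph given as a 0/1 matrix."""
    n = len(a)
    deg = [sum(row) for row in a]  # vertex degrees = row sums of A
    return [[(deg[i] if i == j else 0) - a[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical example: triangle 0-1-2 with a pendant node 3 attached to node 2.
triangle_with_tail = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
L = laplacian(triangle_with_tail)
row_sums = [sum(row) for row in L]  # all zero for a simple graph
```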

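There exist at least two different normalized versions of the Laplacian matrix, namely
$\mathcal{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$ and $\Delta = D^{-1/2} \mathcal{L} D^{1/2}$, where $I$ is the
identity matrix and $D^{-1/2}$ is the diagonal matrix with entries $\frac{1}{\sqrt{\deg_i}}$. In terms of the
degree, their entries can be explicitly written as:

$$
\mathcal{L}_{ij} =
\begin{cases}
1 & \text{if } i = j \text{ and } \deg_i \neq 0 \\
-\dfrac{1}{\sqrt{\deg_i \deg_j}} & \text{if } ij \text{ is an edge} \\
0 & \text{otherwise}
\end{cases}
\qquad
\Delta_{ij} =
\begin{cases}
1 & \text{if } i = j \text{ and } \deg_i \neq 0 \\
-\dfrac{1}{\deg_i} & \text{if } ij \text{ is an edge} \\
0 & \text{otherwise}
\end{cases}
$$

The matrices $\mathcal{L}$ and $\Delta$ are similar, so they have the same set of eigenvalues (spectrum).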
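The entrywise formulas and the similarity relation $\Delta = D^{-1/2} \mathcal{L} D^{1/2}$ can be checked numerically on a small example. This is a sketch under the assumption of a simple graph without isolated nodes; the function name `normalized_laplacians` and the example graph are hypothetical:

```python
import math


def normalized_laplacians(a):
    """Return (deg, sym, rw): degrees and the two normalized Laplacians.

    sym follows the entrywise formula for script-L, rw the one for Delta.
    Assumes a simple undirected graph given as a 0/1 adjacency matrix.
    """
    n = len(a)
    deg = [sum(row) for row in a]
    sym = [[0.0] * n for _ in range(n)]
    rw = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j and deg[i] != 0:
                sym[i][j] = rw[i][j] = 1.0
            elif a[i][j]:
                sym[i][j] = -1.0 / math.sqrt(deg[i] * deg[j])
                rw[i][j] = -1.0 / deg[i]
    return deg, sym, rw


# Hypothetical example: triangle 0-1-2 with a pendant node 3 attached to node 2.
graph = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
deg, sym, rw = normalized_laplacians(graph)

# Largest entrywise deviation between Delta and D^{-1/2} script-L D^{1/2}.
max_dev = max(
    abs(rw[i][j] - sym[i][j] * math.sqrt(deg[j] / deg[i]))
    for i in range(4) for j in range(4)
)
```

Since the conjugation by $D^{1/2}$ only rescales entry $(i, j)$ by $\sqrt{\deg_j / \deg_i}$, the deviation vanishes up to rounding.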

The matrices $A$, $L$, $\mathcal{L}$ and $\Delta$ are called connectivity matrices of the graph. An approach
to connectivity matrices also in terms of the normalized Laplacian operators can be found
in [32, 33, 34].

An undirected and unweighted graph has symmetric real connectivity matrices and
therefore real eigenvalues and a complete set of orthonormal eigenvectors. Also, for each
eigenvalue, its algebraic multiplicity coincides with its geometric multiplicity. Since $A$
has zero diagonal, its trace and hence the sum of the eigenvalues is zero. Moreover, $L$ is
positive semidefinite and singular, so the eigenvalues are $0 = \mu_0 \le \mu_1 \le \cdots \le \mu_{N-1}$ and
their sum (the trace of $L$) is twice the number of edges. Finally, the eigenvalues of $\mathcal{L}$ lie in
the range $[0, 2]$.

While the connectivity matrices depend on the vertex labeling, the spectrum is a graph
invariant. Two graphs are called isospectral or cospectral if the corresponding connectivity
matrices of the graphs have equal multisets of eigenvalues. Isospectral graphs need not
be isomorphic, but isomorphic graphs are always isospectral. Network classification in
terms of their spectrum is still an open problem [35, 36, 37]: however, a first attempt at
(qualitative) network classification in terms of graph spectra can be found in [28, 38] by
Banerjee.

For an introduction to the theory of graph spectra, see [39, 40, 41]. The relations between
the spectral properties of the connectivity matrices and the structure and the dynamics of
the networks are discussed in [42, 32, 43].

2. Overview of spectral similarity measures 

In this section, we introduce a set of similarity measures based on the graph spectra that
were recently proposed in the literature, following an ideal chronological timeline.

The first distance D1 (or, indeed, one-parameter family of distances) we are presenting
is possibly the most natural one. Originally D1 was introduced as an intra-graph measure
[44, 45] and mentioned as an inter-graph distance by Pincombe [46], for evaluating changes
in time-series of graphs. Let $G$, $H$ be two graphs with $N$ nodes and let
$\{\lambda_0 = 0 \le \lambda_1 \le \cdots \le \lambda_{N-1}\}$, $\{\mu_0 = 0 \le \mu_1 \le \cdots \le \mu_{N-1}\}$ be the respective Laplacian
spectra. For an integer $k \le N$, the distance is defined as:

$$
d_k(G,H) =
\begin{cases}
\sqrt{\dfrac{\sum_{i=N-k}^{N-1} (\lambda_i - \mu_i)^2}{\sum_{i=N-k}^{N-1} \lambda_i^2}}
  & \text{if } \sum_{i=N-k}^{N-1} \lambda_i^2 \le \sum_{i=N-k}^{N-1} \mu_i^2 \\[3ex]
\sqrt{\dfrac{\sum_{i=N-k}^{N-1} (\lambda_i - \mu_i)^2}{\sum_{i=N-k}^{N-1} \mu_i^2}}
  & \text{if } \sum_{i=N-k}^{N-1} \mu_i^2 < \sum_{i=N-k}^{N-1} \lambda_i^2
\end{cases}
\tag{1}
$$

The D1 measure is non-negative, separated, symmetric and it satisfies the triangle
inequality, so it is a metric.
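Eq. (1) can be sketched as follows, taking the two Laplacian spectra as plain lists. The function name `d1` and the example spectra (the path and the triangle on three nodes) are illustrative assumptions; both cases of Eq. (1) divide by the smaller of the two sums of squares, which the sketch expresses with `min`, and it assumes that this sum is nonzero:

```python
import math


def d1(lam, mu, k):
    """Distance of Eq. (1) computed on the k largest Laplacian eigenvalues."""
    top_lam = sorted(lam)[-k:]
    top_mu = sorted(mu)[-k:]
    num = sum((x - y) ** 2 for x, y in zip(top_lam, top_mu))
    # Eq. (1) normalizes by whichever sum of squares is smaller.
    den = min(sum(x * x for x in top_lam), sum(y * y for y in top_mu))
    return math.sqrt(num / den)


# Laplacian spectra of two 3-node graphs: the path (0, 1, 3) and the triangle (0, 3, 3).
path_spectrum = [0.0, 1.0, 3.0]
triangle_spectrum = [0.0, 3.0, 3.0]
value = d1(path_spectrum, triangle_spectrum, k=2)  # sqrt(4 / 10)
```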

A more refined spectral distance was defined as a step towards reconstructing a graph
from its spectrum through a Metropolis algorithm [47]. The definition of the measure
D2 follows the dynamical interpretation of an $N$-node network as a molecule of $N$ atoms
connected by identical elastic strings, where the pattern of connections is defined by the
adjacency matrix of the corresponding network. The dynamical system is described by the
set of $N$ differential equations

$$
\ddot{x}_i + \sum_{j=1}^{N} A_{ij} (x_i - x_j) = 0 \qquad \text{for } i = 0, \ldots, N-1 .
$$

The vibrational frequencies $\omega_i$ are given by the eigenvalues of the Laplacian matrix of the
network: $\lambda_i = \omega_i^2$, with $\lambda_0 = \omega_0 = 0$. The spectral density for a graph as the sum of
Lorentz distributions is defined as

$$
\rho(\omega) = K \sum_{i=1}^{N-1} \frac{\gamma}{(\omega - \omega_i)^2 + \gamma^2} ,
$$

where $\gamma$ is the common width and $K$ is the normalization constant, solution of

$$
\int_0^\infty \rho(\omega)\, d\omega = 1 .
$$

Then the spectral distance $\epsilon$ between two graphs $G$ and $H$ with densities $\rho_G(\omega)$ and
$\rho_H(\omega)$ can be defined as

$$
\epsilon(G,H) = \sqrt{\int_0^\infty \left[\rho_G(\omega) - \rho_H(\omega)\right]^2 d\omega} .
\tag{2}
$$

Note that the two integrals above can be explicitly computed through the relation
$\int \frac{1}{1+x^2}\, dx = \arctan(x)$.
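A minimal numerical sketch of Eq. (2) follows. The normalization constant $K$ is obtained in closed form from the arctan relation, while the outer integral is approximated with a midpoint rule on a truncated interval; this truncation, the function names, and the default values of `upper` and `steps` are assumptions of the sketch, not the procedure of [47]. The sketch also assumes a connected graph, so that exactly one zero eigenvalue is dropped:

```python
import math


def lorentz_density(eigs, gamma):
    """Return the function rho(w) built from the nonzero Laplacian eigenvalues."""
    freqs = [math.sqrt(lam) for lam in sorted(eigs)[1:]]  # drop lambda_0 = 0
    # Integral of gamma / ((w - w_i)^2 + gamma^2) over [0, inf), via arctan.
    mass = sum(math.pi / 2 + math.atan(w / gamma) for w in freqs)
    k = 1.0 / mass
    return lambda w: k * sum(gamma / ((w - wi) ** 2 + gamma ** 2) for wi in freqs)


def d2(eigs_g, eigs_h, gamma=0.08, upper=20.0, steps=20000):
    """Approximate Eq. (2); the tail of the integral beyond `upper` is neglected."""
    rho_g = lorentz_density(eigs_g, gamma)
    rho_h = lorentz_density(eigs_h, gamma)
    h = upper / steps
    total = sum(
        (rho_g((i + 0.5) * h) - rho_h((i + 0.5) * h)) ** 2 for i in range(steps)
    )
    return math.sqrt(total * h)


# Laplacian spectra of the 3-node path and of the triangle.
path_spectrum = [0.0, 1.0, 3.0]
triangle_spectrum = [0.0, 3.0, 3.0]
distance = d2(path_spectrum, triangle_spectrum)
```

The step size should stay well below $\gamma$ so that each Lorentzian peak is resolved; with the defaults above the step is 0.001 against $\gamma = 0.08$.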



Note on the Laplacian spectrum: in [39], the Laplacian spectrum is called the vibrational spectrum.

Note on the scale parameter $\gamma$: it specifies the half-width at half-maximum (HWHM), equal to
half the interquartile range.

A simpler measure D3 was introduced in [48] for graph matching, using the graph edit
distance as the reference baseline. The authors compute the spectrum associated with the
classical adjacency matrix, the Laplacian matrix, the signless Laplacian matrix $|L| = D + A$,
and the normalized Laplacian matrix $\mathcal{L}$. They also introduce two more functions: the path
length distribution and the heat kernel $h_t$. The heat kernel is related to the Laplacian by
the equation

$$
\frac{\partial h_t}{\partial t} = -L h_t ,
$$

so that

$$
h_t(u,v) = \sum_{i=0}^{N-1} e^{-\lambda_i t} \phi_i(u) \phi_i(v) ,
$$

where $\lambda_i$ are the Laplacian eigenvalues and $\phi_i$ the corresponding eigenvectors. For $t \to 0$,
$h_t \to I - Lt$, while when $t \to \infty$ then $h_t \to e^{-\lambda_{N-1} t} \phi_{N-1} \phi_{N-1}^T$. By varying $t$, different
representations can be obtained, from the local ($t \to 0$) to the global ($t \to \infty$) structure of
the network. Moreover, if $D_k(u, v)$ is the number of paths of length $k$ between nodes $u$
and $v$, the following identity holds:

$$
h_t(u,v) = e^{-t} \sum_{k \ge 0} D_k(u,v) \frac{t^k}{k!} ,
$$

which allows the explicit computation of the path length distribution:

$$
D_k(u,v) = \sum_{i=0}^{N-1} (1 - \lambda_i)^k \phi_i(u) \phi_i(v) .
$$

The proposed distance is just the Euclidean distance between the vectors of (ordered)
eigenvalues (for a given matrix $M$) for the two networks being compared:

$$
d_M(G,H) = \sqrt{\sum_{i=0}^{N-1} \left( \lambda_i^{(G,M)} - \lambda_i^{(H,M)} \right)^2} ,
\tag{3}
$$

where $\lambda_i^{(\Gamma,M)}$ are the eigenvalues of the graph $\Gamma$ w.r.t. the matrix $M$, where $M$ is either a
connectivity matrix, or the heat kernel matrix, or the path length matrix. As a final
observation, the authors claim that the heat kernel matrix has the highest correlation with the edit
distance, while the adjacency matrix has the lowest.

A similar formula D4 is proposed in [49] as the squared Euclidean ($L_2$) distance between the
vectors of eigenvalues of the Laplacian matrix:

$$
d(G,H) = \sum_{i=0}^{N-1} \left( \lambda_i^{(G,L)} - \lambda_i^{(H,L)} \right)^2 .
\tag{4}
$$
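Eqs. (3) and (4) differ only by the square root, as the following sketch makes explicit. The spectra are passed as lists of equal length and are assumed to come from the same choice of matrix $M$ (here the Laplacian); the function names `d3` and `d4` are illustrative:

```python
import math


def d4(spec_g, spec_h):
    """Eq. (4): squared Euclidean distance between ordered spectra."""
    return sum((x - y) ** 2 for x, y in zip(sorted(spec_g), sorted(spec_h)))


def d3(spec_g, spec_h):
    """Eq. (3): Euclidean distance between ordered spectra."""
    return math.sqrt(d4(spec_g, spec_h))


# Laplacian spectra of the 3-node path and of the triangle.
path_spectrum = [0.0, 1.0, 3.0]
triangle_spectrum = [0.0, 3.0, 3.0]
print(d4(path_spectrum, triangle_spectrum))  # 4.0
print(d3(path_spectrum, triangle_spectrum))  # 2.0
```

Because D4 omits the square root, it grows quadratically with the eigenvalue differences, which is relevant when the ranges of the distances are compared.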

The next and last two measures are based on the concept of spectral distribution. 

The distance D5 is introduced in [50], aiming at comparing Internet network topologies.
Let $f_\lambda$ be the (normalized Laplacian) eigenvalue distribution and $\mu(\lambda)$ a weighting
function, and define a generic distance between graphs $G$ and $H$ as follows:

$$
d_{\mu,p}(G,H) = \int_\lambda \mu(\lambda) \left( f_{\lambda,G}(\lambda) - f_{\lambda,H}(\lambda) \right)^p d\lambda .
$$

The weighting function is then defined as $\mu(\lambda) = (1 - \lambda)^4$, an approximation of the graph
irregularity as defined in [39], while the usual Euclidean metric is chosen, so that $p = 2$:
the exact formula thus reads

$$
d(G,H) = \int_\lambda (1 - \lambda)^4 \left( f_{\lambda,G}(\lambda) - f_{\lambda,H}(\lambda) \right)^2 d\lambda .
\tag{5}
$$



AN INTRODUCTION TO SPECTRAL DISTANCES IN NETWORKS (EXTENDED VERSION) 7 

Calculating the eigenvalues of a large (even sparse) matrix is computationally expensive;
an approximated version is also proposed, based on estimation of the distribution $f$ of
eigenvalues by means of pivoting and Sylvester's Law of Inertia, used to compute the
number of eigenvalues that fall in a given interval. To estimate the distribution, $K$ equally
spaced bins in the range $[0, 2]$ are used, so that a weighted spectral distribution measure for
a graph $G$ can be defined for an integer $n > 0$ as the following sum over the bins:

$$
\sum_{k \in K} (1 - k)^n f(\lambda = k) .
$$

The generic formula can now be specialized to:

$$
d_n(G,H) = \sum_{k \in K} (1 - k)^n \left( f_G(\lambda = k) - f_H(\lambda = k) \right)^2 ,
\tag{6}
$$

a family of metrics parameterized by the integer $n$.
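A minimal sketch of the binned distance of Eq. (6) is given below. It takes already computed normalized Laplacian eigenvalues rather than estimating bin counts through Sylvester's Law of Inertia, uses bin centers as the values of $k$ and relative frequencies as $f$; these choices, the function names and the default `bins=50`, `n=4` are assumptions of the sketch:

```python
def binned_distribution(eigs, bins):
    """Fraction of eigenvalues falling in each of `bins` equal bins on [0, 2]."""
    counts = [0] * bins
    for lam in eigs:
        idx = min(int(lam / 2.0 * bins), bins - 1)  # lam = 2 goes in the last bin
        counts[idx] += 1
    return [c / len(eigs) for c in counts]


def d5_binned(eigs_g, eigs_h, bins=50, n=4):
    """Eq. (6) with bin centers as k and relative frequencies as f."""
    f_g = binned_distribution(eigs_g, bins)
    f_h = binned_distribution(eigs_h, bins)
    width = 2.0 / bins
    centers = [(i + 0.5) * width for i in range(bins)]
    return sum((1 - k) ** n * (g - h) ** 2 for k, g, h in zip(centers, f_g, f_h))


# Normalized Laplacian spectra of the triangle (0, 1.5, 1.5) and the 3-node path (0, 1, 2).
triangle_norm = [0.0, 1.5, 1.5]
path_norm = [0.0, 1.0, 2.0]
distance = d5_binned(triangle_norm, path_norm)
```

With an even exponent $n$ the weight $(1 - k)^n$ is non-negative and nearly vanishes around $k = 1$, so differences in that region of the spectrum contribute little.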

The last spectral measure D6 in this review was presented in [51] and it employs two
different divergence measures, Kullback-Leibler and Jensen-Shannon. The Kullback-Leibler
divergence measure is defined on two probability distributions $p_1$, $p_2$ of a discrete random
variable $X$ as

$$
KL(p_1, p_2) = \sum_{x \in X} p_1(x) \log \frac{p_1(x)}{p_2(x)} .
$$

The Kullback-Leibler divergence measure is not a metric, because it is not symmetric and it
does not satisfy the triangle inequality. To overcome this problem, the author considers the
Jensen-Shannon measure, which in some sense is the symmetrization of KL:

$$
JS(p_1, p_2) = \frac{1}{2} KL\left(p_1, \frac{p_1 + p_2}{2}\right) + \frac{1}{2} KL\left(p_2, \frac{p_1 + p_2}{2}\right) .
$$

With this definition, the square root of JS is a metric. Thus, if $f$ is the (normalized
Laplacian) spectral probability distribution, a distance between two networks can be defined as

$$
d(G,H) = \sqrt{JS(f_G, f_H)} .
\tag{7}
$$
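The two divergences and Eq. (7) can be sketched on discrete distributions given as lists of probabilities over the same bins. The function names are illustrative, and the convention that terms with zero probability contribute nothing to KL is an assumption of the sketch:

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p, q):
    """Jensen-Shannon divergence, the symmetrized form of KL."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def d6(p, q):
    """Eq. (7): square root of the Jensen-Shannon divergence."""
    return math.sqrt(max(js(p, q), 0.0))  # guard against tiny negative rounding


# Two hypothetical binned spectral distributions.
p = [0.5, 0.5]
q = [0.9, 0.1]
asymmetric = (kl(p, q), kl(q, p))   # the two values differ
symmetric = (d6(p, q), d6(q, p))    # the two values coincide
```

In `js` the mixture is positive wherever either input is, so the logarithm in `kl` is always defined there, unlike in a direct call to `kl` with a zero in the second argument.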

Clearly, all the above distances D1-D6 suffer from the existence of isospectral graphs,
although such graphs are relatively rare (especially in real networks) and qualitatively
similar. For this reason, it would be more correct to call them distances between classes of
isospectral networks. The six described distances are analytically summarized in Tab. 2.

We conclude by mentioning that the spectrum of the graph can be indirectly used for
assessing similarity [52]. The authors employ a seriation method based on the graph spectrum to
convert the graph into a string so as to get a sounder basis for the graph edit distance
computation, aiming at the optimization of a function of the leading eigenvectors of the adjacency
matrix.

3. Benchmarking experiments 

In this section, we demonstrate the use of the distances in Tab. 2 in the comparison of
network topologies in a controlled situation. To this aim, we constructed three synthetic
benchmark datasets, detailed hereafter. All simulations have been performed within the R
statistical environment [53]. Throughout all simulations, we kept, for each distance, the
parameter values as in the reference paper wherever possible, e.g., $\gamma = 0.08$ for the scale
of the Lorentz distribution in D2, and the heat diffusion kernel with time $t = 3.5$ as the
matrix in distance D3. For D1 we choose to use the [-jj largest eigenvalues.
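Table 2. Spectral graph distances

Distance | Formula | Equation | Ref.
D1 | $d_k(G,H) = \sqrt{\sum_{i=N-k}^{N-1} (\lambda_i - \mu_i)^2 \big/ \sum_{i=N-k}^{N-1} \lambda_i^2}$ if $\sum_{i=N-k}^{N-1} \lambda_i^2 \le \sum_{i=N-k}^{N-1} \mu_i^2$; $\sqrt{\sum_{i=N-k}^{N-1} (\lambda_i - \mu_i)^2 \big/ \sum_{i=N-k}^{N-1} \mu_i^2}$ if $\sum_{i=N-k}^{N-1} \mu_i^2 < \sum_{i=N-k}^{N-1} \lambda_i^2$ | (1) | [46]
D2 | $\epsilon(G,H) = \sqrt{\int_0^\infty [\rho_G(\omega) - \rho_H(\omega)]^2 d\omega}$ | (2) | [47]
D3 | $d_M(G,H) = \sqrt{\sum_{i=0}^{N-1} (\lambda_i^{(G,M)} - \lambda_i^{(H,M)})^2}$ | (3) | [48]
D4 | $d(G,H) = \sum_{i=0}^{N-1} (\lambda_i^{(G,L)} - \lambda_i^{(H,L)})^2$ | (4) | [49]
D5e | $d(G,H) = \int_\lambda (1 - \lambda)^4 (f_{\lambda,G}(\lambda) - f_{\lambda,H}(\lambda))^2 d\lambda$ | (5) | [50]
D5a | $d_n(G,H) = \sum_{k \in K} (1 - k)^n (f_G(\lambda = k) - f_H(\lambda = k))^2$ | (6) | [50]
D6 | $d(G,H) = \sqrt{JS(f_G, f_H)}$ | (7) | [51]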



3.1. Data Description. The simulated topologies are generated within the R statistical
environment [53] by means of the simulator provided by the package netsim [54, 55],
producing networks that reproduce the principal characteristics of transcriptional regulatory
networks. The simulator takes into account the scale-free distribution of the connectivity
and constructs networks whose clustering coefficient is independent of the number of nodes
in the network. All random graphs are generated by keeping the default values of netsim
for the structural parameters.

In the first experiment we consider a random network $A$ on $N$ vertices and we compare
it with the fully connected network $F$ with the same number of nodes, the complemental
network $\bar{A}$ and a network $A_p$ obtained from $A$ by modifying (inserting/deleting) about
$p\%$ of the links. For smoothing purposes, the process is repeated $b$ times to obtain the
first benchmarking dataset $B_1(b, N, p)$. An instance of this benchmark dataset is shown in
Fig. 2. In Tab. 3 we show the average over $b = 50$ instances of the number of links of the
starting matrix $A$ and the perturbed matrix $A_5$. Because of the small number of links in
the original matrix, the 5% perturbation mostly results in link insertion. On average, the
density of the original graph $A$ can be expressed by the relation $l \simeq 1.7N - 5$, where $l$ is
the number of links and $N$ the number of vertices.

Figure 2. Benchmark dataset $B_1(b, 25, 5)$: the original graph $A$, the
perturbed graph $A_5$, the complemental graph $\bar{A}$ and the fully connected
graph $F$.

In the second experiment we simulate a time-series of $T$ networks on $N$ nodes starting
from a randomly generated graph $S_1$, where each successive element $S_i$ of the series is
generated from its ancestor $S_{i-1}$ by randomly modifying $p\%$ of the links. Again $b = 50$
instances of the series are created and collected into the second benchmarking dataset
$B_2(b, T, N, p)$. With this strategy, the number of existing links increases with the series
index, since the original adjacency matrix is sparse. The starting matrix $S_1$ has on
average 38.1±5.2 links, while the last element of the series $S_{20}$ has 132.3±8.2. Three
elements of this benchmark dataset are shown in Fig. 3.

The third experiment is based on a benchmark dataset $B_3(b, T, N, n_d, n_a)$. Starting
from $B_2(b, T, N, p)$, different perturbations are applied: each successive element $S_i$ of
the series is generated from its ancestor $S_{i-1}$ by randomly deleting $n_d$ links and adding
$n_a$ links. By construction, when $n_d = n_a$ the number of existing links is constant for all
elements of the series. Three elements of $B_3(b, 20, 25, 5, 5)$ are shown in Fig. 4.



Table 3. Number of links in the original matrix $A$, in the fully connected
matrix $F$ (maximum number of links for the given dimension)
and in the perturbed matrix $A_5$, expressed as mean ± standard deviation
on 50 replicates.

N | F | A | A_5
10 | 45 | 13.4±2.0 | 13.1±2.3
20 | 190 | 29.0±3.6 | 36.6±5.2
50 | 1225 | 79.3±7.4 | 131.8±4.2
100 | 4950 | 164.5±13.6 | 388.2±12.1
Figure 3. Benchmark dataset $B_2(b, 20, 25, 5)$: the original graph $S_1$
(first element of the series), the tenth element $S_{10}$ of the series and the
final graph $S_{20}$.



3.2. Results. In Exp. 1 the six distances D1-D6 were applied on 4 instances of $B_1(50, N, 5)$
for $N = 10, 20, 25, 100$ and distances between the original graph $A$ and the three companion
matrices $F$, $\bar{A}$ and $A_p$ were computed. Results are collected in Tab. 4.

Distance D4 spans a considerably wider range than other measures, due to the absence
of the square root in the comparison of the Laplacian spectra, while D5 is restricted to a
very small interval. The same distance D4 also shows a high dependency on the dimension
of the considered matrices and the number of links (see Tab. 4).

The best stability in terms of the relative standard deviation $\sigma/\mu$ is reached by D2 and
D4. Furthermore, D2, differently from all other measures, is almost independent of the
number of vertices. Finally, D6 is the only measure that, in the cases with $N > 10$, gives
a lower distance for $F$ than for $\bar{A}$.

The summary plots in Fig. 5 display results of Exp. 2 on the benchmark dataset
$B_2(50, 20, 25, 5)$. Distances between consecutive elements $(S_i, S_{i+1})$ of the series
(defined as Step $i$) were computed: results are averaged over the 50 replicates. For all D1-D6,
the distance decreases for increasing steps, although on different ranges (as already pointed
out for Experiment 1) and with different widths for the confidence intervals. D3 and D5
decrease more quickly for initial steps, so they are less useful when comparing large
networks.

Figure 4. Benchmark dataset $B_3(b, 20, 25, 5, 5)$: the original graph $S_1$
(first element of the series), the tenth element $S_{10}$ of the series and the
final graph $S_{20}$.

Table 4. Results of the experiments on the first benchmarking dataset.
For each measure D1-D6 and number of network vertices $N$, the values
are reported of the distances between the network $A$ and the networks
$A_5$, $\bar{A}$ and $F$ in terms of the minimum ($m$), mean ($\mu$) ± standard
deviation and maximum ($M$) on the 50 replicates. Values of D5 are in 10^'^.

0.025   0.108±0.053
0.982±0.383
1.324±0.350   1.811   10
0.067±0.074   0.294   0.006
10
0.941±0.603   1.844   0.092   3.635±2.340   8.907
4.112±2.306   9.491   10
0.102   0.169±0.039   0.259   0.192   0.386±0.084   0.507   0.431   0.507±0.04   0.552   20
0.037   0.194±0.069   0.342   2.117   2.768±0.379   371   2.455   3.038±0.372   4.006   20
0.202   0.284±0.049   0.381   1.025   1.091±0.034   1.165   1.538

Figure 5. Plots of the distances of consecutive elements of the series of
the dataset $B_2(50, 20, 25, 5)$. Solid line: mean over the $b = 50$ replicates;
dashed lines: $1\sigma$ standard deviation confidence intervals.
To better highlight similarities and differences among the distances regardless of their
ranges of values, we also computed their mutual correlations and plotted the mutual scatter
plots in Fig. 6. All correlation values are quite high, ranging from 0.8225 to 0.9970: D3
and D5 are mutually strongly correlated, but they tend to separate from the other distances,
as evidenced both by the global correlation values and by the scatter plot profiles departing
from the panel diagonals.

Figure 6. Mutual scatterplots (upper triangle) and correlation values
(lower triangle) for Exp. 2.



Experiment 3 was performed on the benchmark dataset $B_3(50, 20, 25, 5, 5)$, and the
results are reported in two figures matching those of Exp. 2. Since the difference between
consecutive pairs of elements of the series is quite similar throughout all the steps, as
expected all distances show a nearly constant trend, as shown in Fig. 7.

The oscillations around the mean value nevertheless vary strongly among different
measures, as evidenced by Fig. 8. In particular, distance D3 is anticorrelated to all
distances but D5; furthermore, only in 4 cases out of 15 do we obtain a correlation value higher
than 0.7, with again D1, D2, D4 and D6 forming a group of more similar behaviour.

Figure 7. Plots of the distances of consecutive elements of the series
of the dataset $B_3(50, 20, 25, 5, 5)$. Solid line: mean over the $b = 50$
replicates; dashed lines: $1\sigma$ standard deviation confidence intervals.

A possible hierarchy of the six distances was explored by clustering. Two dendrograms
were built for Exp. 2 and Exp. 3 by using the hclust function in R and are shown in Fig. 9.
The clusters use average linkage, and the correlation distance $cd(\cdot, \cdot) = 1 - \mathrm{Corr}(\cdot, \cdot)$
is used as the dissimilarity measure. Although there is an appreciable coherence among
measures on macroscopic trends, when downscaling to microscopic trends correlations get
much looser. Distances D1, D2, D4, D6 seem to group together, while D3 has a more
erratic behaviour. Finally, a wide difference occurs in the cluster heights between the
two experiments: the homogeneous macroscopic situation of Exp. 2 has a narrower height
span than the microscopic case in Exp. 3.
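The dissimilarity used for the dendrograms, $cd = 1 - \mathrm{Corr}$, can be sketched as follows for two series of distance values. The text does not state which correlation coefficient was used; the Pearson coefficient, the function names and the toy series below are assumptions of the sketch:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_distance(x, y):
    """cd(x, y) = 1 - Corr(x, y): 0 for perfectly correlated series, 2 for anticorrelated."""
    return 1.0 - pearson(x, y)


# Hypothetical per-step values of three distances along a series.
steps_a = [1.0, 2.0, 3.0, 4.0]
steps_b = [2.0, 4.0, 6.0, 8.0]   # same trend on a different range
steps_c = [4.0, 3.0, 2.0, 1.0]   # opposite trend
close = correlation_distance(steps_a, steps_b)
far = correlation_distance(steps_a, steps_c)
```

This dissimilarity ignores the range of each distance, which is why measures with very different scales can still be clustered together.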

4. A regulatory network example

To conclude, we apply D1-D6 to three different perturbations of the transcriptional
interactions network in Escherichia coli, described in [56] and shown in Fig. 10.

The transcriptional database contains 577 interactions between 116 TFs and 419
operons. Starting from an existing database (RegulonDB), the authors added 35 new TFs,
including alternative sigma factors, and over a hundred new interactions from the literature.
The original adjacency network (without self-interactions) consists of 420 vertices and 519
(undirected) links. To show the influence on distances, we compare the distances between
the original network and the three networks obtained by silencing (thus deleting the
links involving the corresponding vertex) the activator/repressor factor crp and the two
repressor factors fnr and himA, having respectively 72, 22 and 21 links. In Tab. 5 we list the
values of the distances between the original network $EC$ and its three perturbations, denoted
respectively as $EC_{crp}$, $EC_{fnr}$ and $EC_{himA}$.

All distances seem to be heavily dependent on the number of removed links: for all six
distances,

$$
D(EC, EC_{crp}) > D(EC, EC_{fnr}),\ D(EC, EC_{himA}) .
$$

Nevertheless, when the numbers of removed links are almost equal, this relation is no
longer valid.



Publicly available at http://www.weizmann.ac.il/mcb/UriAlon/Network_motifs_in_coli/ColiNet-1.1/
http://regulondb.ccg.unam.mx/
Figure 8. Mutual scatterplots (upper triangle) and correlation values
(lower triangle) for Exp. 3.

Figure 9. Cluster dendrograms with average linkage and correlation
distance of D1-D6 for the two Experiments 2 and 3 (panels: Experiment 2, Experiment 3).

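The distance $D(EC, EC_{himA})$ is comparable to $D(EC, EC_{fnr})$ for $D$ = D1, D4,
while the former is much bigger than the latter for all other distances. In fact,

$$
\frac{D(EC, EC_{himA})}{D(EC, EC_{fnr})} =
\begin{cases}
2.8 & \text{for D2} \\
4.8 & \text{for D6} \\
27 & \text{for D3} \\
35 & \text{for D5}
\end{cases}
$$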
Figure 10. Transcriptional interactions network for Escherichia coli,
with edges relative to gene crp marked in black.

Table 5. Distances between $EC$ and the perturbed networks $EC_{crp}$,
$EC_{fnr}$ and $EC_{himA}$.

Network | Links | D1 | D2 | D3 | D4 | D5 | D6
($EC$, $EC_{crp}$) | 519 vs 453 | 0.418 | 0.085 | 8.711 | 2191.9 | 1.01178·10^-3 | 0.555
($EC$, $EC_{fnr}$) | 519 vs 497 | 0.058 | 0.023 | 0.191 | 41.4 | 0.01256·10^-3 | 0.083
($EC$, $EC_{himA}$) | 519 vs 498 | 0.056 | 0.065 | 5.187 | 44.4 | 0.43315·10^-3 | 0.404
($EC_{crp}$, $EC_{fnr}$) | 453 vs 497 | 0.557 | 0.074 | 6.938 | 2140.9 | 0.82079·10^-3 | 0.479
($EC_{crp}$, $EC_{himA}$) | 453 vs 498 | 0.557 | 0.072 | 0.982 | 2138.1 | 0.26794·10^-3 | 0.180
($EC_{fnr}$, $EC_{himA}$) | 497 vs 498 | 0.023 | 0.071 | 3.730 | 10.2 | 0.30303·10^-3 | 0.357
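The ratios quoted in the text can be recomputed from the entries of Tab. 5, as in the following sketch. The dictionary layout and variable names are illustrative assumptions; the numbers are the table values for the pairs ($EC$, $EC_{fnr}$) and ($EC$, $EC_{himA}$), with D5 expressed in units of $10^{-3}$. Small discrepancies with the quoted ratios are expected because the table entries are rounded:

```python
# Distances from EC to the two perturbed networks, copied from Tab. 5
# (D5 in units of 10^-3, which cancel in the ratio).
ec_fnr = {"D1": 0.058, "D2": 0.023, "D3": 0.191, "D4": 41.4, "D5": 0.01256, "D6": 0.083}
ec_hima = {"D1": 0.056, "D2": 0.065, "D3": 5.187, "D4": 44.4, "D5": 0.43315, "D6": 0.404}

# Ratio D(EC, EC_himA) / D(EC, EC_fnr) for each of the six distances.
ratios = {name: ec_hima[name] / ec_fnr[name] for name in ec_fnr}

for name in sorted(ratios):
    print(name, round(ratios[name], 2))
```

For D1 and D4 the ratio stays close to one, while for D2, D6, D3 and D5 it grows progressively larger.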



For instance, the corresponding ratios $D(EC, EC_{crp}) / D(EC, EC_{himA})$ are much smaller,
namely (from Tab. 5) about 1.3 for D2, 1.4 for D6, 1.7 for D3 and 2.3 for D5.
Table 6. Natural logarithm of the size of the automorphism group of
the original and the perturbed networks

Network G | log(|Aut(G)|) | max{λ_i}
$EC$ | 330.0173 | 73.021
$EC_{crp}$ | 377.5827 | 27.015
$EC_{fnr}$ | 341.4692 | 73.019
$EC_{himA}$ | 347.4488 | 73.020



A possible explanation lies in the quite different structure of the two networks $EC_{fnr}$ and
$EC_{himA}$, although they are obtained by silencing almost the same number of links from the
original network.

The intrinsic structural difference between $EC_{fnr}$ and $EC_{himA}$ is indeed highlighted
by the remarkable variation in the size of the respective groups of automorphisms, as shown
in Tab. 6. For instance, the structure of $EC_{himA}$ is almost $e^{347.4488 - 341.4692} \simeq 400$ times
more symmetric than $EC_{fnr}$. From this point of view, spectral distances can greatly help
in analyzing subtle differences between networks where more classical methods do not
help much. As an example, the leading Laplacian eigenvalue is commonly used when analyzing
network structure, because it is a good indicator of the stability and the local dynamics
[57]. However, in this particular example, this value is of no help, as indicated in Tab. 6,
since $EC$, $EC_{fnr}$ and $EC_{himA}$ have essentially the same leading eigenvalue; nevertheless
the spectral distances, encoding information coming from the whole spectrum, can better
separate very similar networks. Summarizing the observations following the experiments
on synthetic data and the results in Tab. 5, we conclude by proposing D2 as the most
reliable metric, in terms of both stability and robustness, being less prone to odd
behaviours.